1. PREPARE

Our first SNA case study is guided by the work of Matthew Pittinsky and Brian V. Carolan (2008), which employed a social network perspective to examine whether a teacher’s perceptions of student friendships agreed with the students’ own reports. Sadly, this excellent study did not include any visual depictions comparing student- and teacher-perceived friendship networks, but we are going to fix that!

Our primary aim for this case study is to gain some hands-on experience with essential R packages and functions for preparing network data for analysis and creating a simple network sociogram to help describe visually what our network “looks like.” Specifically, this case study will cover the following topics pertaining to each data-intensive workflow process (Krumm, Means, and Bienkowski 2018):

  1. Prepare: Prior to analysis, we’ll take a look at the context from which our data came, formulate some research questions, and get introduced to the {tidygraph} and {ggraph} packages for analyzing and visualizing relational data.

  2. Wrangle: In the wrangling section of our case study, we will learn some basic techniques for manipulating, cleaning, transforming, and merging network data.

  3. Explore: With our network data tidied, we learn to calculate some key network measures and to illustrate some of these stats through network visualization.

  4. Model: We conclude our analysis by introducing community detection algorithms for identifying groups and revisiting sentiment about the common core.

  5. Communicate: We develop a polished sociogram to highlight key findings.

1a. Review the Research

Pittinsky, M., & Carolan, B. V. (2008). Behavioral versus cognitive classroom friendship networks. Social Psychology of Education, 11(2), 133-147.

Abstract

Researchers of social networks commonly distinguish between “behavioral” and “cognitive” social structure. In a school context, for example, a teacher’s perceptions of student friendship ties, not necessarily actual friendship relations, may influence teacher behavior. Revisiting early work in the field of sociometry, this study assesses the level of agreement between teacher perceptions and student reports of within-classroom friendship ties. Using data from one middle school teacher and four classes of students, the study explores new ground by assessing agreement over time and across classroom social contexts, with the teacher-perceiver held constant. While the teacher’s perceptions and students’ reports were statistically similar, 11–29% of possible ties did not match. In particular, students reported significantly more reciprocated friendship ties than the teacher perceived. Interestingly, the observed level of agreement varied across classes and generally increased over time. This study further demonstrates that significant error can be introduced by conflating teacher perceptions and student reports. Findings reinforce the importance of treating behavioral and cognitive classroom friendship networks as distinct, and analyzing social structure data that are carefully aligned with the social process hypothesized.

Research Questions

The central question guiding this investigation was:

Do student reports agree with teacher perceptions when it comes to classroom friendship ties and with what consequences for commonly used social network measures?

We will be using this question to guide our own analysis of classroom friendships as reported by the teacher and by students. Specifically, we will use the first part of this question to guide our analysis and develop two sociograms to help visually compare similarities and differences between teacher- and student-reported classroom friendships.

Data Collection

To measure the level of agreement between student and teacher reports of classroom student friendships, sociometric data were collected from each student in all four classes and the teacher provided similar reports on all students. To collect student reports of friendships, students were given a class roster and asked to describe their relationship with each student in the class. Choices included best friend, friend, know-like, know, know-dislike, strongly dislike, and do not know. In the terminology of network analysis, these sociometric data are “valued” (degrees of friendship, not just yes or no) and “directed” (friendship nominations were not presumed to be reciprocal). Data were collected in the autumn and spring. All “best friend” and “friend” choices are coded as ‘1’ (friend), while all other choices are coded as ‘0’ (not friend). The teacher’s reports of students’ friendships were generated in a similar manner.
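The recoding step described above can be sketched in a few lines of base R. The response labels come from the study description, but the three-student matrix and its values are invented purely for illustration and are not the study's data.

```r
# Hypothetical valued responses for three students (rows = chooser, columns = chosen).
# The diagonal is NA because students do not rate themselves.
responses <- matrix(
  c(NA,            "best friend", "know",
    "friend",      NA,            "know-dislike",
    "do not know", "friend",      NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3"), c("s1", "s2", "s3"))
)

# Code "best friend" and "friend" as 1 (friend) and every other choice as 0 (not friend).
# %in% returns a plain vector, so we rebuild the matrix shape and names afterwards.
friends <- matrix(
  as.integer(responses %in% c("best friend", "friend")),
  nrow = nrow(responses),
  dimnames = dimnames(responses)
)

friends
```

The result is a directed, unweighted adjacency matrix of the same kind we import later in this case study.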

Analyses

To assess agreement between perceived friendship by the teacher and students, QAP (quadratic assignment procedure) correlations for each class’s two matrices (teacher and student generated) were analyzed in the autumn and spring. A QAP correlation is used to calculate the degree of association between two sets of relations; it tests whether the probability of dyad overlap in the teacher matrix is correlated with the probability of dyad overlap in the student matrix. It does so by running a large number of simulations. These simulations generate random matrices with sizes and value distributions based on the original two matrices being tested. It then computes an average level of correlation between the matrices that would be expected at random. Similarly, it calculates the probability that the observed degree of correlation between two matrices would be as large or as small as that observed based on the range of correlations generated in the random permutations, with an associated significance statistic.
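To give a feel for the logic of a QAP correlation, here is a minimal base R sketch. It is our own simplified illustration, not the software used by Pittinsky and Carolan: the function name qap_cor() and both 5 x 5 matrices are made up, and the random matrices are produced by shuffling the actor labels of one matrix (rows and columns together), which is the usual way QAP keeps the network's structure intact.

```r
# Minimal QAP-style permutation test for two square binary matrices (a sketch only).
qap_cor <- function(x, y, reps = 1000) {
  off_diag <- row(x) != col(x)             # ignore self-ties on the diagonal
  observed <- cor(x[off_diag], y[off_diag])
  permuted <- replicate(reps, {
    p <- sample(nrow(x))                   # relabel the actors at random
    cor(x[p, p][off_diag], y[off_diag])    # rows and columns are permuted together
  })
  list(
    observed = observed,
    expected = mean(permuted),             # average correlation expected by chance
    p_value  = mean(permuted >= observed)  # share of permutations at least as large
  )
}

# Two made-up directed networks standing in for student and teacher matrices.
student <- matrix(c(0, 1, 1, 0, 0,
                    1, 0, 1, 0, 0,
                    1, 1, 0, 1, 0,
                    0, 0, 1, 0, 1,
                    0, 0, 0, 1, 0), nrow = 5, byrow = TRUE)
teacher <- student
teacher[1, 3] <- 0                         # the "teacher" misses three reported ties
teacher[3, 1] <- 0
teacher[4, 5] <- 0

set.seed(123)
result <- qap_cor(student, teacher)
result
```

Dedicated packages such as {sna} offer full implementations; the point here is only that the observed correlation is judged against correlations from many relabeled versions of the same network.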

Key Findings

As reported by Pittinsky and Carolan (2008) in their findings section:

While the teacher’s perceptions and students’ reports were statistically similar, 11–29% of possible ties did not match. In particular, students reported significantly more reciprocated friendship ties than the teacher perceived.
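To make the idea of “possible ties that did not match” concrete, the short base R sketch below compares two small directed matrices cell by cell. Both matrices are invented for illustration, so the resulting percentage says nothing about the actual study.

```r
# Two made-up 4 x 4 directed friendship matrices (1 = friend, 0 = not friend).
student <- matrix(c(0, 1, 1, 0,
                    1, 0, 0, 0,
                    1, 1, 0, 1,
                    0, 0, 1, 0), nrow = 4, byrow = TRUE)
teacher <- matrix(c(0, 1, 0, 0,
                    1, 0, 0, 0,
                    0, 1, 0, 1,
                    0, 0, 0, 0), nrow = 4, byrow = TRUE)

# With 4 students there are 4 * 3 = 12 possible directed ties (the diagonal is excluded).
off_diag <- row(student) != col(student)

# Share of possible ties where the two reports disagree.
mismatch <- mean(student[off_diag] != teacher[off_diag])
mismatch   # 0.25, i.e. 3 of the 12 possible ties do not match
```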

👉 Your Turn

Take a look at the paper in our essential readings repository on GitHub and highlight one or two findings and/or conclusions you found especially interesting.

1b. Identify a Question(s)

Recall from above that the central question guiding the #COMMONCORE Project was:

How are social media-enabled social networks changing the discourse in American politics that produces and sustains education policy?

For Unit 4, we are going to focus our questions on something a bit less ambitious but inspired by this work:

  1. Who are the transmitters, transceivers, and transcenders in our Common Core Twitter network?
  2. What subgroups, or factions, exist in our network?
  3. Which actors in our network tend to be more opposed to the Common Core?

To address the last question, we’ll revisit the techniques we learned in our Unit 3 VADER sentiment analysis.

👉 Your Turn

Based on what you know about networks and the context so far, what other research question(s) might we ask in this context that a social network perspective might be able to answer?

In the space below, type a brief response to the following questions:

  • YOUR RESPONSE HERE

1c. Load Packages

As highlighted in Chapter 6 of Data Science in Education Using R (DSIEUR), one of the first steps of every workflow should be to set up your “Project” within RStudio. Recall that:

A Project is the home for all of the files, images, reports, and code that are used in any given project

Since we are working from an R project cloned from GitHub, a Project has already been set up for you as indicated by the .Rproj file in your main directory in the Files pane. Instead, we will focus on getting our project set up with the requisite packages we’ll need for analysis.

Packages, sometimes called libraries, are shareable collections of R code that can contain functions, data, and/or documentation and extend the functionality of R. You can always check to see which packages have already been installed and loaded into RStudio Cloud by looking at the Files, Plots, & Packages Pane in the lower right hand corner.

tidyverse 📦

One package that we’ll be using extensively is {tidyverse}. Recall from earlier tutorials that the {tidyverse} package is actually a collection of R packages designed for reading, wrangling, and exploring data and which all share an underlying design philosophy, grammar, and data structures. These shared features are sometimes called “tidy data principles.”

Click the green arrow in the right corner of the “code chunk” that follows to load the {tidyverse} library.

library(tidyverse)

Don’t worry if you saw a number of messages: those probably mean that the tidyverse loaded just fine. Any conflicts you may have seen mean that functions in the packages you loaded have the same name as functions in other packages, and R will default to the function from the last loaded package unless you name the package explicitly using the package::function() syntax.
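To see how a name conflict like this plays out, here is a small base R sketch. The user-defined filter() below is hypothetical and simply stands in for a package function that shares its name with the filter() function from R's built-in {stats} package.

```r
# A hypothetical function that happens to share its name with stats::filter().
filter <- function(x) x[x > 2]

# R finds our definition first, so it "masks" the {stats} version.
kept <- filter(1:5)

# Naming the package with :: picks the {stats} version explicitly
# (here, a 3-point moving average).
smoothed <- stats::filter(1:5, rep(1 / 3, 3))

rm(filter)   # remove our stand-in so it does not linger in the workspace
```

Here kept holds 3, 4, and 5, while smoothed holds the moving averages computed by the {stats} function.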

Next, we will introduce two new packages that extend the tidyverse suite of packages and that we will use throughout SNA Learning Labs 1-4.

New Packages

tidygraph 📦

The {tidygraph} package is a huge package that exports 280 different functions and methods, including access to almost all of the dplyr verbs plus a few more, developed for use with relational data. While network data itself is not tidy, it can be envisioned as two tidy tables, one for node data and one for edge data.

The {tidygraph} package provides a way to switch between the two tables and uses dplyr verbs to manipulate them. Furthermore it provides access to a lot of graph algorithms with return values that facilitate their use in a tidy workflow.
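As a rough sketch of the “two tidy tables” idea, the example below builds a tiny graph from a hypothetical node table and edge table and uses activate() to choose which table the dplyr verbs act on. The node names, edges, and the label column are all invented for illustration.

```r
library(tidygraph)
library(dplyr)

# Hypothetical node and edge tables for a three-student network.
# The from and to columns refer to row positions in the node table.
nodes <- data.frame(name = c("A", "B", "C"))
edges <- data.frame(from = c(1L, 1L, 2L), to = c(2L, 3L, 1L))

graph <- tbl_graph(nodes = nodes, edges = edges, directed = TRUE)

# activate() switches between the node table and the edge table.
graph <- graph %>%
  activate(nodes) %>%
  mutate(label = paste("Student", name)) %>%   # adds a column to the node table
  activate(edges) %>%
  filter(from == 1)                            # keeps only ties sent by the first node

node_table <- graph %>% activate(nodes) %>% as_tibble()
edge_table <- graph %>% activate(edges) %>% as_tibble()
```

After running this, node_table has three rows with a new label column, and edge_table contains only the two ties sent by student A.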

ggraph 📦

Created by the same developer as {tidygraph}, {ggraph} – pronounced gg-raph or g-giraffe hence the logo – is an extension of {ggplot2} aimed at supporting relational data structures such as networks, graphs, and trees. Both packages are more modern and widely adopted approaches to data visualization in R.

While ggraph builds upon the foundation of ggplot and its API, it comes with its own self-contained set of geoms, facets, etc., as well as adding the concept of layouts to the grammar of graphics, i.e. the “gg” in ggplot and ggraph.

readxl 📦

The {readxl} package makes it easy to get data out of Excel and into R. Compared to many of the existing packages (e.g. gdata, xlsx, xlsReadWrite) readxl has no external dependencies, so it’s easy to install and use on all operating systems. It is designed to work with tabular data.

Since one of our data wrangling steps in the next section is importing network matrices stored in Excel files, this package will come in handy.

👉 Your Turn

Use the code chunk below to load the {tidygraph}, {ggraph}, and {readxl} packages:

# YOUR CODE HERE
library(tidygraph)
library(ggraph)
library(readxl)

2. WRANGLE

In general, data wrangling involves some combination of cleaning, reshaping, transforming, and merging data (Wickham and Grolemund 2016). As highlighted in Estrellado et al. (2020), wrangling network data can be even more challenging than other data sources since network data often includes variables about both individuals and their relationships.

For our data wrangling in lab, we’re keeping it simple since working with relational data is a bit of a departure from our working with rectangular data frames. Our primary goals for Lab 1 are learning how to:

  1. Import Data. In this section, we learn about the read_excel() function for importing network data stored in a common format: the adjacency matrix.

  2. Create a Network Object. Before we can create our sociogram, we’ll first need to convert our data frames into a special data format, an R network object, for working with relational data.

2a. Import Data

One of our primary goals for this case study is to create sociograms that compare student- and teacher-reported friendships. To do so, we’ll need to import two Excel files originally obtained from the Social Network Analysis and Education companion site. Both files contain edges stored as a square matrix (more on this later), one for student-reported friendships and one for teacher-reported friendships.

These files are included in the lab-1/data folder of your RStudio project. A description of each file from the companion website is copied below along with a link to the original file:

  1. 99472_ds3.xlsx: This adjacency matrix consists of student-reported friendship relations among 27 students in one class in the fall semester. These data are directed and unweighted; a friendship tie is present if the student reported that another was either a best friend or friend.

  2. 99472_ds5.xlsx: This adjacency matrix consists of the teacher-reported friendship relations among 27 students in one class in the fall semester. These data are directed and unweighted; a friendship tie is present if the teacher reported that students were either a best friend or friend.

Recall from above that our relations, or edges, are stored as an adjacency matrix in which columns and rows consist of the same actors and each cell contains information about the tie between each pair of actors. In our case, the tie is a directed and unweighted “arc” where a value of 1 indicates that one student was reported as a friend of another.

Let’s use the read_excel() function to import the student-reported-friends.xlsx file, add an argument setting the column names to FALSE since our file is a simple matrix with no header or column names, and assign the matrix to a variable named student_friends:

RStudio Tip: Type ?read_excel into the console and check the arguments section to examine the different arguments that can be used with this function.

student_friends <- read_excel("data/student-reported-friends.xlsx", 
                              col_names = FALSE)

Before importing our teacher reported friendship file, let’s quickly inspect the student_friends R object we just imported to see what we’ll be working with.

student_friends
## # A tibble: 27 × 27
##     ...1  ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9 ...10 ...11 ...12 ...13
##    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     0     1     0     1     1     1     1     1     1     0     0     1     0
##  2     1     0     0     0     1     0     0     0     0     1     1     0     0
##  3     1     0     0     1     0     0     0     1     0     1     0     0     0
##  4     1     0     0     0     0     0     0     0     0     0     0     1     0
##  5     1     1     0     1     0     1     1     1     1     0     1     1     1
##  6     1     0     0     0     1     0     0     0     1     0     1     1     1
##  7     1     0     1     1     0     0     0     0     1     0     0     0     1
##  8     1     0     1     1     1     0     1     0     1     1     1     0     1
##  9     1     0     0     0     0     1     1     0     0     0     1     0     1
## 10     1     1     1     1     1     0     1     1     0     0     1     1     1
## # … with 17 more rows, and 14 more variables: ...14 <dbl>, ...15 <dbl>,
## #   ...16 <dbl>, ...17 <dbl>, ...18 <dbl>, ...19 <dbl>, ...20 <dbl>,
## #   ...21 <dbl>, ...22 <dbl>, ...23 <dbl>, ...24 <dbl>, ...25 <dbl>,
## #   ...26 <dbl>, ...27 <dbl>

As you can see, we have a 27 x 27 “tibble” or data table representing our friendship ties. Unfortunately, this data is stored in such a simple format that we have no way to easily identify who is friends with whom, since our data is missing names or some kind of identifier for students in our network.

R has packages for creating random names to help anonymize data, but to keep things simple, we’ll just assign the numbers 1-27 as names for our rows and columns.

rownames(student_friends) <- 1:27

colnames(student_friends) <- 1:27

You may have seen a warning stating: Setting row names on a tibble is deprecated. You can ignore that for now, but it’s basically telling us that this approach is outdated and that we will need to use a newer one or our code will some day stop working.
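One possible way to sidestep that warning, sketched here with a small made-up data frame rather than our actual data, is to convert the table to a base R matrix first and then name its rows and columns, since matrices fully support row names. The object names below are illustrative only.

```r
# A made-up 3 x 3 data frame standing in for an imported adjacency matrix.
toy_friends <- data.frame(a = c(0, 1, 1), b = c(1, 0, 0), c = c(0, 1, 0))

# Convert to a matrix first, then name rows and columns in a single step.
toy_matrix <- as.matrix(toy_friends)
ids <- seq_len(nrow(toy_matrix))          # 1, 2, 3
dimnames(toy_matrix) <- list(ids, ids)    # the numbers are stored as text labels

toy_matrix
```

For this lab we’ll stick with the simpler approach used above, since it still works.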

Again, let’s quickly inspect our student_friends data table to see if this worked:

student_friends
## # A tibble: 27 × 27
##      `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`   `9`  `10`  `11`  `12`  `13`
##  * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     0     1     0     1     1     1     1     1     1     0     0     1     0
##  2     1     0     0     0     1     0     0     0     0     1     1     0     0
##  3     1     0     0     1     0     0     0     1     0     1     0     0     0
##  4     1     0     0     0     0     0     0     0     0     0     0     1     0
##  5     1     1     0     1     0     1     1     1     1     0     1     1     1
##  6     1     0     0     0     1     0     0     0     1     0     1     1     1
##  7     1     0     1     1     0     0     0     0     1     0     0     0     1
##  8     1     0     1     1     1     0     1     0     1     1     1     0     1
##  9     1     0     0     0     0     1     1     0     0     0     1     0     1
## 10     1     1     1     1     1     0     1     1     0     0     1     1     1
## # … with 17 more rows, and 14 more variables: `14` <dbl>, `15` <dbl>,
## #   `16` <dbl>, `17` <dbl>, `18` <dbl>, `19` <dbl>, `20` <dbl>, `21` <dbl>,
## #   `22` <dbl>, `23` <dbl>, `24` <dbl>, `25` <dbl>, `26` <dbl>, `27` <dbl>

Much better! Now we can see that student 1 indicated that student 2 is their friend, and student 2 indicated that student 1 is their friend, so we can say that this friendship is “reciprocated.”
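Rather than checking pairs by eye, we could also ask R whether a tie is reciprocated. The sketch below uses a small made-up matrix (not our student data) and relies on the fact that a friendship is mutual when the cell for “i names j” and the cell for “j names i” both equal 1.

```r
# A made-up 4 x 4 directed friendship matrix (rows nominate columns).
m <- matrix(c(0, 1, 1, 0,
              1, 0, 0, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE,
            dimnames = list(as.character(1:4), as.character(1:4)))

# Is the tie between students 1 and 2 reciprocated?
pair_1_2 <- m["1", "2"] == 1 && m["2", "1"] == 1   # TRUE

# Multiplying the matrix by its transpose, cell by cell, leaves a 1 only where
# ties run in both directions.
reciprocated <- m * t(m)
n_mutual_pairs <- sum(reciprocated) / 2            # each mutual pair is counted twice
n_mutual_pairs                                     # 2 (students 1 and 2, students 3 and 4)
```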

👉 Your Turn

Complete the code chunk below to import the teacher-reported-friends.xlsx file, add row and column names, and inspect the result:

# YOUR CODE HERE
teacher_friends <- read_excel("data/teacher-reported-friends.xlsx", 
                              col_names = FALSE)

rownames(teacher_friends) <- 1:27

colnames(teacher_friends) <- 1:27

teacher_friends
## # A tibble: 27 × 27
##      `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`   `9`  `10`  `11`  `12`  `13`
##  * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1     0     0     0     1     0     0     0     0     0     0     0     1     0
##  2     0     0     1     0     0     0     0     1     0     1     0     0     0
##  3     0     1     0     0     0     0     0     1     0     1     0     0     0
##  4     0     0     0     0     0     0     0     0     0     0     0     0     0
##  5     1     0     0     0     0     0     0     0     0     0     0     1     1
##  6     0     0     0     0     0     0     0     0     0     0     0     0     1
##  7     0     0     0     0     0     0     0     0     0     0     0     0     1
##  8     0     1     1     0     0     0     0     0     0     1     0     0     0
##  9     0     0     0     0     0     0     0     0     0     0     1     0     0
## 10     0     1     1     0     0     0     0     1     0     0     0     0     0
## # … with 17 more rows, and 14 more variables: `14` <dbl>, `15` <dbl>,
## #   `16` <dbl>, `17` <dbl>, `18` <dbl>, `19` <dbl>, `20` <dbl>, `21` <dbl>,
## #   `22` <dbl>, `23` <dbl>, `24` <dbl>, `25` <dbl>, `26` <dbl>, `27` <dbl>

2b. Make a Tidy Graph

Before we can begin exploring our data through network visualization, we must first restructure our “tibble” into a formal matrix object and then convert it to a network class R object required by the {tidygraph} and {ggraph} packages.

Convert to Matrix

Now that we have names included for our rows and columns, we need to convert our data table, or tibble, to a formal matrix class object. To do so is relatively simple using the as.matrix() function built into R.

student_matrix <- as.matrix(student_friends)

The words “class” and “object” have been used quite a bit in this case study and warrant a brief explanation. Classes and objects are basic concepts of Object-Oriented Programming environments like R. An object is simply a data structure that has some methods and attributes. Everything in R is essentially an object. A class is just a blueprint or a sketch of these objects. It represents the set of properties or methods that are common to all objects of one type.

Let’s use the class() function on the student_friends and student_matrix to see the types of objects we just created:

class(student_friends)
## [1] "tbl_df"     "tbl"        "data.frame"
class(student_matrix)
## [1] "matrix" "array"

Great! We can now see that our student_matrix is formally an object of the “matrix” class.
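As a small aside on classes, the base R sketch below shows that the same values can live in objects of different classes, which is why functions sometimes accept one and not the other. The object names and values are made up.

```r
# The same four values stored as two different kinds of object.
toy_df  <- data.frame(x = c(0, 1), y = c(1, 0))
toy_mat <- as.matrix(toy_df)

class(toy_df)                  # "data.frame"
inherits(toy_mat, "matrix")    # TRUE
is.data.frame(toy_mat)         # FALSE: same values, different blueprint
```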

Convert to Graph Object

Our final step before we’re able to begin exploring our data is to convert our matrix to a network object recognized by the {tidygraph} and {ggraph} packages. The as_tbl_graph() function can easily convert relational data from all common network data formats such as matrices, network, phylo, dendrogram, data.tree, graph, etc. 

Run the following code to convert our matrix to a directed network graph and save it as a new object called student_network. Note that we include the argument directed = TRUE in our as_tbl_graph() function since our network is directed.

student_network <- as_tbl_graph(student_matrix, directed = TRUE)

Now let’s take a quick look at our new student_network object:

student_network
## # A tbl_graph: 27 nodes and 203 edges
## #
## # A directed simple graph with 2 components
## #
## # Node Data: 27 × 1 (active)
##   name 
##   <chr>
## 1 1    
## 2 2    
## 3 3    
## 4 4    
## 5 5    
## 6 6    
## # … with 21 more rows
## #
## # Edge Data: 203 × 3
##    from    to weight
##   <int> <int>  <dbl>
## 1     1     2      1
## 2     1     4      1
## 3     1     5      1
## # … with 200 more rows

As you can see, our student_network object provides a range of information about our network including network size, type, number of components, and a preview of the node and edge lists that it created. The node and edge lists are treated just like a typical data frame and can now be used with other tidyverse packages and functions to create new actor-level network variables like degree, reciprocity, and centrality measures.
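As a preview of what creating actor-level variables can look like, here is a minimal sketch that adds in-degree and out-degree columns to the node table of a tiny made-up network. The matrix and column names are our own; centrality_degree() is a {tidygraph} function meant to be used inside mutate().

```r
library(tidygraph)
library(dplyr)

# A made-up directed adjacency matrix for four students (rows nominate columns).
toy_matrix <- matrix(c(0, 1, 1, 0,
                       1, 0, 0, 0,
                       0, 1, 0, 0,
                       0, 0, 0, 0), nrow = 4, byrow = TRUE,
                     dimnames = list(as.character(1:4), as.character(1:4)))

toy_network <- as_tbl_graph(toy_matrix, directed = TRUE) %>%
  activate(nodes) %>%
  mutate(in_degree  = centrality_degree(mode = "in"),    # times named as a friend
         out_degree = centrality_degree(mode = "out"))   # friends named

node_measures <- toy_network %>% activate(nodes) %>% as_tibble()
node_measures
```

In this toy network, student 2 is named by two classmates, while student 4 neither names nor is named by anyone.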

What is an edge list?

We’ll learn more about edgelists in Lab 3. The edgelist format is very commonly used in network analysis but is slightly different from other formats you have likely worked with before. Specifically, the values in the first two columns of each row represent a dyad, or tie between two nodes in a network. An edgelist can also contain other information regarding the strength, duration, or frequency of the relationship, sometimes called weight, in addition to other “edge attributes.”

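In directed networks like ours, the first column (from) identifies the student who made a friendship nomination and the second column (to) identifies the student they nominated. The first three rows above, for example, show that student 1 indicated students 2, 4, and 5 are friends. Since our network is unweighted, the 1 for “weight” just indicates that a friendship was present.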
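To see how an adjacency matrix and an edgelist relate, the base R sketch below turns a small made-up matrix into a from/to table by hand. This is only an illustration of the idea; as_tbl_graph() does the equivalent work for us.

```r
# A made-up 3 x 3 directed adjacency matrix (rows nominate columns).
toy_matrix <- matrix(c(0, 1, 1,
                       1, 0, 0,
                       0, 0, 0), nrow = 3, byrow = TRUE)

# which(..., arr.ind = TRUE) returns the row and column of every cell equal to 1.
ties <- which(toy_matrix == 1, arr.ind = TRUE)

# Each non-zero cell becomes one row of the edgelist: row = sender, column = receiver.
edge_list <- data.frame(from = ties[, "row"], to = ties[, "col"], weight = 1)
edge_list <- edge_list[order(edge_list$from, edge_list$to), ]
edge_list
```

The three ties in the matrix become three rows: 1 to 2, 1 to 3, and 2 to 1.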

👉 Your Turn

Complete the code chunk below to convert your teacher_friends object first to a matrix and then to a network object:

# YOUR CODE HERE
teacher_matrix <- as.matrix(teacher_friends)

teacher_network <- as_tbl_graph(teacher_matrix, directed = TRUE)

teacher_network
## # A tbl_graph: 27 nodes and 69 edges
## #
## # A directed simple graph with 6 components
## #
## # Node Data: 27 × 1 (active)
##   name 
##   <chr>
## 1 1    
## 2 2    
## 3 3    
## 4 4    
## 5 5    
## 6 6    
## # … with 21 more rows
## #
## # Edge Data: 69 × 3
##    from    to weight
##   <int> <int>  <dbl>
## 1     1     4      1
## 2     1    12      1
## 3     1    27      1
## # … with 66 more rows

Now answer the following questions:

  1. How many students are in our network?
    • YOUR RESPONSE HERE
  2. Who reported more friendships, the teacher or the students? How do you know?
    • YOUR RESPONSE HERE

3. EXPLORE

As noted in our course readings, exploratory data analysis involves the processes of describing your data (such as by calculating the means and standard deviations of numeric variables, or counting the frequency of categorical variables) and, often, visualizing your data prior to modeling.

In Section 3, we use the {tidygraph} package for retrieving network descriptives and introduce the {ggraph} package to create a network visualization to help illustrate these metrics. Specifically, in this section we’ll learn to:

  1. Plot Basics. We focus primarily on actors and edges in this walkthrough, including the edge weights we added in the previous section as well as node degree, an important and fairly intuitive measure of centrality.

  2. Adding Nodes.

  3. Adding Edges. Finally, we wrap up the Explore phase by learning to plot a network and tweak key elements like the size, shape, and position of nodes and edges to better communicate key findings.

One of the defining characteristics of the social network perspective is its use of graphic imagery to represent actors and their relations with one another. To emphasize this point, Carolan (2014) reported that:

The visualization of social networks has been a core practice since its foundation more than 100 years ago and remains a hallmark of contemporary social network analysis. 

Network visualization can be used for a variety of purposes, ranging from highlighting key actors to even serving as works of art.

This excellent figure from Katya Ognyanova’s also excellent tutorial on Static and Dynamic Network Visualization with R helps illustrate the variety of goals a good network visualization can accomplish:

3a. Simple Sociograms

These visual representations of the actors and their relations, i.e. the network, are called sociograms. Actors who are most central to the network, such as those with higher node degrees, are usually placed in the center of the sociogram and their ties are placed near them. As we’ll see in just a bit, the students with the most ties will be placed by most graph layout algorithms near the center of the graph.

The plot() function from R’s built-in {graphics} package can be used to make a wide range of graphs, including sociograms, but as you’ll see it’s a bit lacking and is limited in the level of customization allowed.

In the code chunk below, use the plot() function with your student_network object to see what the basic plot function produces:

plot(student_network)

Not super great. In fact, it’s visualizations like these that give sociograms the unflattering nickname of “hair ball” plots!

If this had been a smaller network this might have been a little more useful, but one important insight is that we have already identified an “isolate” in our network, that is, a student who neither named others as a friend nor was named by others as a friend.

Fortunately, the {ggraph} package includes a plethora of plotting parameters for graph layouts, edges and nodes to improve the visual design of network graphs.

Let’s first take a quick look at the autograph() function for making quick and simple sociograms.

autograph(student_network)

A little better, but also lacking in many important ways. Like the plot() function, it does allow some small degree of customization, but it is still rather limited and is best used for very quick sociograms to get a quick feel for the data.

Run the following code chunk to see some additional arguments you can add to the autograph() function:

autograph(student_network,
          node_size = local_size(),
          node_label = name,
          node_colour = local_size())
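It may help to see what local_size() actually returns. As we understand its default settings, it counts the nodes in each node's immediate neighborhood, including the node itself and ignoring tie direction, so it is related to, but not the same as, the number of friends a student named. The tiny network below is made up to show the difference alongside out-degree.

```r
library(tidygraph)
library(dplyr)

# Made-up network: student 1 names students 2 and 3; student 4 is an isolate.
toy_nodes <- data.frame(name = as.character(1:4))
toy_edges <- data.frame(from = c(1L, 1L), to = c(2L, 3L))
toy_network <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = TRUE)

sizes <- toy_network %>%
  activate(nodes) %>%
  mutate(neighborhood  = local_size(),                     # self plus direct neighbors
         friends_named = centrality_degree(mode = "out")) %>%  # nominations sent
  as_tibble()

sizes
```

Here the isolate has a neighborhood size of 1 (just themselves) even though they named no friends, which is worth keeping in mind when reading legends based on local_size().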

👉 Your Turn

Use the code chunk below to try out these simple sociogram functions on your teacher_network object you created above:

# YOUR CODE HERE
plot(teacher_network)

autograph(teacher_network,
          node_size = local_size(),
          node_colour = local_size())

3b. Sophisticated Sociograms

One thing to keep in mind when building a network viz with {ggraph} is that, just like its ggplot() counterpart, the ggraph() function is the first function required and takes care of setting up the plot object along with creating the layout for the plot based on the network object and the layout specification provided.

Let’s first pass our student_network object to ggraph() and see what happens.

ggraph(student_network)

Wow, that was unimpressive. But don’t worry, just like the ggplot() function, this didn’t produce much on its own. All that the ggraph() function does is set up the network object to make a sociogram and create a layout for our network, in this case using the default “stress” layout.
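Although nothing is drawn yet, a layout has been computed behind the scenes. As a minimal sketch, the {ggraph} function create_layout() lets us peek at the x and y positions a layout assigns to each node. The five-node ring network below is made up and stands in for our student network.

```r
library(tidygraph)
library(ggraph)

# A small made-up network: a directed ring of five nodes.
toy_network <- create_ring(5, directed = TRUE)

# create_layout() computes node positions without drawing anything.
toy_layout <- create_layout(toy_network, layout = "stress")

# Each node gets one row with an x and a y coordinate.
head(toy_layout[, c("x", "y")])
```

These coordinates are what the node and edge geoms use once we start adding layers to the plot.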

Add Nodes

Very similar to how ggplot() uses the + operator to “layer” functions together to progressively build more sophisticated graphs, ggraph() uses the + operator to progressively build a sociogram.

To add our nodes, we’ll add the geom_node_point() function. Again, just like with {ggplot2}, the “geom” in the geom_node_point() function stands for “geometric elements”, or geoms for short, and represents what you actually see in the plot.

👉 Your Turn

Now “add” the geom_node_point() function to our code using the + operator:

ggraph(student_network) + 
  geom_node_point() 

Well, at least we have our nodes now! But the default “stress” layout for our sociogram is not so great. Let’s fix that.

Add Layout

One of the major advances in visualization since the first hand-drawn sociograms developed by Jacob Moreno (1934) to represent relations among children in school is the use of software and algorithms to automatically lay out networks on a grid.

There are many different layout methods, but we’ll start with the Fruchterman-Reingold (FR) layout, which is one of the most used layout algorithms for network visualization. These types of force-directed algorithms generally work well with large networks and try to lay out graphs in “an aesthetically-pleasing way” by making edges roughly equal in length and minimizing overlap.

Let’s go ahead and include the layout argument, which in addition to including its own unique layouts, can incorporate layouts from the {igraph} package like fr for the Fruchterman-Reingold (FR) layout:

ggraph(student_network, layout = "fr") +
  geom_node_point()

That’s not much better so let’s stick with the “stress” layout for now. Feel free to try out some other ggraph layout methods if you like, however.

Tweak Nodes

Also like {ggplot2}, geoms can include aesthetics, or aes for short, such as alpha for transparency, as well as color, shape and size.

Let’s now add some “aesthetics” to our points by including the aes() function and arguments such as size = and color =, which we set using the local_size() function to help highlight the number of friends students have:

ggraph(student_network, layout = "stress") + 
  geom_node_point(aes(size = local_size(),
                      color = local_size()))

We can easily see that the number of friends ranges from 5 to 20, with the exception of one “isolated” student we identified earlier who is not connected to any other students in the network.

That isolated student is easy to spot, but we still can’t tell which student is which. Let’s fix that by adding another layer with some node text and labels. Since node labels are a geometric element, we can apply aesthetics to them as well, like color and size. Let’s also include the repel = argument that when set to TRUE will avoid overlapping text.

ggraph(student_network, layout = "stress") + 
  geom_node_point(aes(size = local_size(),
                      color = local_size())) +
  geom_node_text(aes(label = name,
                     size = local_size()), # note we can treat this like a number
                 repel=TRUE)

Add Edges

Now, let’s literally connect the dots and add some edges using the geom_edge_link() function.

ggraph(student_network, layout = "stress") + 
  geom_node_point(aes(size = local_size(),
                      color = local_size())) +
  geom_node_text(aes(label = name),
                 repel=TRUE) +
  geom_edge_link()

Ack! Without some adjustment, the edges make it really difficult to see the nodes. Fortunately, you can also adjust the edges just like we did with the nodes above. Let’s now include the following arguments:

  • arrow = to include some arrows 1mm in length,

  • end_cap = to leave a small gap around each node and keep arrows from overlapping them, and

  • alpha = .2 to set the transparency of our edges so they fade more into the background and help keep the focus on our nodes:

ggraph(student_network, layout = "stress") + 
  geom_node_point(aes(size = local_size(),
                      color = local_size())) +
  geom_node_text(aes(label = name),
                 repel=TRUE) +
  geom_edge_link(arrow = arrow(length = unit(1, 'mm')), 
                 end_cap = circle(3, 'mm'),
                 alpha = .2)

Add a Theme

Finally, let’s add a theme, which controls the finer points of display, like the font size and background color. The theme_graph() function adds a theme specially tuned for graph visualizations. It removes redundant elements in order to put the focus on the data, and if you type ?theme_graph in the console you will get a sense of the level of fine-tuning you can do if desired.

Let’s add theme_graph() to our sociogram, remove the legends since they are not especially useful, and call it good for now:

ggraph(student_network, layout = "stress") + 
  geom_node_point(aes(size = local_size(),
                      color = local_size())) +
  geom_node_text(aes(label = name),
                 repel=TRUE) +
  geom_edge_link(arrow = arrow(length = unit(1, 'mm')), 
                 end_cap = circle(3, 'mm'),
                 alpha = .2) +
  theme_graph() +
  theme(legend.position = "none") # hide the size and color legends

Much better!

Note: If you’re having difficulty seeing the sociogram in the small R Markdown code chunk, you can copy and paste the code into the console. The sociogram will then show in the Plots pane, where you can enlarge it and even save it as an image file.

👉 Your Turn

Now that you have a sense of how the {ggraph} package works to build network graphs, use the code chunk below and try building a sophisticated sociogram for the teacher_network object that you created above.

There are no right or wrong answers, just have some fun trying out different options for graph layouts, edges and nodes and see if you can build something that is visually pleasing to you.

ggraph(teacher_network, layout = "fr") + 
  geom_node_point(aes(size = local_size(),
                      color = local_size())) +
  geom_node_text(aes(label = name),
                 repel=TRUE) +
  geom_edge_link(arrow = arrow(length = unit(1, 'mm')), 
                 end_cap = circle(3, 'mm'),
                 alpha = .2) +
  theme_graph()

Congrats! You made it to the end of the EXPLORE section!


4. MODEL

As highlighted in Chapter 3 of Data Science in Education Using R, the Model step of the data science process entails “using statistical models, from simple to complex, to understand trends and patterns in the data.” We will not explore the use of models for SNA until Lab 4, but recall from the PREPARE section that, to assess agreement between friendships perceived by the teacher and by students, Pittinsky and Carolan (2008) note that:

The QAP (quadratic assignment procedure) [is] used to calculate the degree of association between two sets of relations and tests whether the probability of dyad overlap in the teacher matrix is correlated with the probability of dyad overlap in the student matrix. It does so by running a large number of simulations. These simulations generate random matrices with sizes and value distributions based on the original two matrices being tested.

We will learn more about the QAP and other models for statistical inference when working with relational data in Learning Lab 4.


5. COMMUNICATE

Our goal is to distill the analysis from above into a simple “data product” designed to illustrate key findings about how the student-reported and teacher-reported friendship networks compare. For the purposes of this task, imagine that your audience consists of teachers and school leaders who have limited background in SNA, and adapt the following steps accordingly:

  1. Select. Select our sociogram from above, or create an entirely new sociogram if so motivated, that you think would be interesting or relevant for the target audience and that helps answer our research question.

  2. Polish. Create a visually attractive sociogram to help illustrate similarities and differences in classroom friendships reported by teachers and students.

  3. Narrate. Write a brief narrative to accompany your visualization and/or table that includes the following:

    • The question or questions guiding the analysis;

    • The conclusions you’ve reached based on our findings;

    • How your audience might use this information;

    • How you might revisit or improve upon this analysis in the future.

👉 Your Turn ⤵

Use the code chunk below to create a polished table and/or visualization(s) and write a brief narrative in the space that follows.

Data Visualization or Table

# YOUR CODE HERE

Narrative

NARRATIVE GOES HERE…

🧶 Knit & Check ✅

Congratulations - you’ve completed the Lab 1 case study! One final step is to “Knit” your document by clicking the drop down arrow next to the ball of yarn in the menu bar at the top of this markdown file, and then selecting “Knit to HTML” or another preferred output format. This will do two things: 1) it will check through all your code for any errors, and 2) it will create a file in your directory that you can use to share your work through GitHub Pages, RPubs, or any other preferred means.

References

Carolan, Brian. 2014. “Social Network Analysis and Education: Theory, Methods & Applications.” https://doi.org/10.4135/9781452270104.
Estrellado, Ryan A., Emily A. Freer, Jesse Mostipak, Joshua M. Rosenberg, and Isabella C. Velásquez. 2020. Data Science in Education Using R. Routledge. https://doi.org/10.4324/9780367822842.
Krumm, Andrew, Barbara Means, and Marie Bienkowski. 2018. Learning Analytics Goes to School. Routledge. https://doi.org/10.4324/9781315650722.
Pittinsky, Matthew, and Brian V Carolan. 2008. “Behavioral Versus Cognitive Classroom Friendship Networks.” Social Psychology of Education 11 (2): 133–47.
Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc. https://r4ds.had.co.nz.
---
title: "Who's Friends with Who in Middle School"
subtitle: "LASER Institute Learning Lab 1 Case Study"
author: "Dr. Shaun Kellogg"
date: "`r format(Sys.Date(),'%B %e, %Y')`"
output:
  html_document:
    toc: yes
    toc_depth: 4
    toc_float: yes
    code_folding: show
    code_download: TRUE
editor_options:
  markdown:
    wrap: 72
bibliography: references.bib
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
```

## 1. PREPARE

Our first SNA case study is guided by the work of Matthew Pittinsky and
Brian V. Carolan (2008), which employed a social network perspective to
examine whether a teacher's perceptions of student friendships agreed
with students' own reports. Sadly, this excellent study did not include
any visual depictions comparing student and teacher perceived friendship
networks, but we are going to fix that!

Our primary aim for this case study is to gain some hands-on experience
with essential R packages and functions for preparing network data for
analysis and creating a simple network sociogram to help describe
visually what our network "looks like." Specifically, this case study
will cover the following topics pertaining to each data-intensive
workflow process [@krumm2018]:

1.  **Prepare**: Prior to analysis, we'll take a look at the context
    from which our data came, formulate some research questions, and get
    introduced to the {tidygraph} and {ggraph} packages for analyzing
    and visualizing relational data.

2.  **Wrangle**: In the wrangling section of our case study, we will
    learn some basic techniques for manipulating, cleaning,
    transforming, and merging network data.

3.  **Explore**: With our network data tidied, we learn to calculate
    some key network measures and to illustrate some of these stats
    through network visualization.

4.  **Model**: We briefly revisit the quadratic assignment procedure
    (QAP) that the original study used to assess agreement between the
    two networks, which we will explore further in Lab 4.

5.  **Communicate**: We develop a polished sociogram to highlight key
    findings.

### 1a. Review the Research

![](img/pittinsky-carolan.png){width="50%"}

Pittinsky, M., & Carolan, B. V. (2008). Behavioral versus cognitive
classroom friendship networks. *Social Psychology of
Education*, *11*(2), 133-147.

#### Abstract

Researchers of social networks commonly distinguish between "behavioral"
and "cognitive" social structure. In a school context, for example, a
teacher's perceptions of student friendship ties, not necessarily actual
friendship relations, may influence teacher behavior. Revisiting early
work in the field of sociometry, this study assesses the level of
agreement between teacher perceptions and student reports of
within-classroom friendship ties. Using data from one middle school
teacher and four classes of students, the study explores new ground by
assessing agreement over time and across classroom social contexts, with
the teacher-perceiver held constant. While the teacher's perceptions and
students' reports were statistically similar, 11--29% of possible ties
did not match. In particular, students reported significantly more
reciprocated friendship ties than the teacher perceived. Interestingly,
the observed level of agreement varied across classes and generally
increased over time. This study further demonstrates that significant
error can be introduced by conflating teacher perceptions and student
reports. Findings reinforce the importance of treating behavioral and
cognitive classroom friendship networks as distinct, and analyzing
social structure data that are carefully aligned with the social process
hypothesized.

#### Research Questions

The central question guiding this investigation was:

> Do student reports agree with teacher perceptions when it comes to
> classroom friendship ties and with what consequences for commonly used
> social network measures?

We will use this question to guide our own analysis of the classroom
friendships reported by students and their teacher. Specifically, we
will use the first part of this question to develop two sociograms that
help us visually compare similarities and differences between teacher-
and student-reported classroom friendships.

#### Data Collection

To measure the level of agreement between student and teacher reports of
classroom student friendships, sociometric data were collected from each
student in all four classes and the teacher provided similar reports on
all students. To collect student reports of friendships, students were
given a class roster and asked to describe their relationship with each
student in the class. Choices included best friend, friend, know-like,
know, know-dislike, strongly dislike, and do not know. In the
terminology of network analysis, these sociometric data are "valued"
(degrees of friendship, not just yes or no) and "directed" (friendship
nominations were not presumed to be reciprocal). Data were collected in
the autumn and spring. All "best friend" and "friend" choices are coded
as '1' (friend), while all other choices are coded as '0' (not friend).
The teacher's reports of students' friendships were generated in a
similar manner.

#### Analyses

To assess agreement between perceived friendship by the teacher and
students, QAP (quadratic assignment procedure) correlations for each
class's two matrices (teacher and student generated) were analyzed in
the autumn and spring. A QAP correlation is used to calculate the degree
of association between two sets of relations; it tests whether the
probability of dyad overlap in the teacher matrix is correlated with the
probability of dyad overlap in the student matrix. It does so by running
a large number of simulations. These simulations generate random
matrices with sizes and value distributions based on the original two
matrices being tested. It then computes an average level of correlation
between the matrices that would be expected at random. Similarly, it
calculates the probability that the observed degree of correlation
between two matrices would be as large or as small as that observed
based on the range of correlations generated in the random permutations,
with an associated significance statistic.
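
To make the permutation logic concrete, here is a minimal base R sketch
of a QAP-style correlation. It is not the authors' procedure or
software: the two 5 x 5 matrices, the `dyad_cor()` helper, and the
choice of 1,000 permutations are all hypothetical. It also assumes the
common approach of shuffling the row and column labels of one matrix
together, rather than generating entirely new random matrices.

```r
# Hypothetical student-reported friendships (1 = friend, 0 = not).
student <- matrix(c(0, 1, 1, 0, 0,
                    1, 0, 0, 1, 0,
                    1, 0, 0, 0, 1,
                    0, 1, 0, 0, 1,
                    0, 0, 1, 1, 0),
                  nrow = 5, byrow = TRUE)

# Hypothetical teacher perceptions: the same ties with two disagreements.
teacher <- student
teacher[1, 3] <- 0
teacher[4, 1] <- 1

# Correlate the off-diagonal cells (the dyads) of two matrices.
dyad_cor <- function(a, b) {
  off_diag <- row(a) != col(a)
  cor(a[off_diag], b[off_diag])
}

observed <- dyad_cor(student, teacher)

# Shuffle student labels (rows and columns together) many times and
# recompute the correlation to see what chance agreement looks like.
set.seed(2008)
permuted <- replicate(1000, {
  p <- sample(nrow(student))
  dyad_cor(student[p, p], teacher)
})

# Share of permutations that agree at least as well as the real data.
p_value <- mean(permuted >= observed)

observed
p_value
```

A small `p_value` would suggest that the two sets of reports agree more
than we would expect if students were matched up at random.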

#### Key Findings

As reported by @pittinsky2008behavioral in their findings section:

> While the teacher's perceptions and students' reports were
> statistically similar, 11--29% of possible ties did not match. In
> particular, students reported significantly more reciprocated
> friendship ties than the teacher perceived.

#### **👉 Your Turn** **⤵**

Take a look at the paper in our essential readings repository on GitHub
and highlight one or two findings and/or conclusions you found
especially interesting.

-   

### 1b. Identify a Question(s)

Recall from above that the central question guiding the Pittinsky and
Carolan study was:

> Do student reports agree with teacher perceptions when it comes to
> classroom friendship ties and with what consequences for commonly used
> social network measures?

For Lab 1, we are going to focus on something a bit less ambitious but
inspired by this work:

1.  What do the student-reported and teacher-reported friendship
    networks for one class look like?
2.  How are these two networks visually similar, and how do they differ?

#### **👉 Your Turn** **⤵**

Based on what you know about networks and the context so far, what other
research question(s) might we ask in this context that a social
network perspective might be able to answer?

In the space below, type a brief response:

-   YOUR RESPONSE HERE

### 1c. Load Packages

As highlighted in [Chapter 6 of Data Science in Education Using
R](https://datascienceineducation.com/c06.html) (DSIEUR), one of the
first steps of every workflow should be to set up your "Project" within
RStudio. Recall that:

> A **Project** is the home for all of the files, images, reports, and
> code that are used in any given project

Since we are working from an R project cloned from GitHub, a Project has
already been set up for you as indicated by the `.Rproj` file in your
main directory in the Files pane. Instead, we will focus on getting our
project set up with the requisite packages we'll need for analysis.

**Packages**, sometimes called libraries, are shareable collections
of R code that can contain functions, data, and/or documentation and
extend the functionality of R. You can always check to see which
packages have already been installed and loaded into RStudio Cloud by
looking at the Files, Plots, & Packages pane in the lower right hand
corner.

#### tidyverse 📦

![](img/tidyverse.png){width="30%"}

One package that we'll be using extensively is {tidyverse}. Recall from
earlier tutorials that the {tidyverse} package is actually a [collection
of R packages](https://www.tidyverse.org/packages) designed for reading,
wrangling, and exploring data and which all share an underlying design
philosophy, grammar, and data structures. These shared features are
sometimes called "tidy data principles."

Click the green arrow in the right corner of the "code chunk" that
follows to load the {tidyverse} package.

```{r load-tidyverse}
library(tidyverse)
```

Don't worry if you saw a number of messages: those probably mean that
the tidyverse loaded just fine. Any conflicts you may have seen mean
that functions in the packages you loaded have the same name as
functions in other packages. R will default to the function from the
last loaded package unless you specify the package explicitly with the
`package::function()` syntax.

Next, we will introduce two new packages that extend the tidyverse suite
of packages and that we will use throughout SNA Learning Labs 1-4.

### New Packages

#### tidygraph 📦

![](img/tidygraph.png){width="20%"}

The {[tidygraph](https://tidygraph.data-imaginist.com)} package is a
huge package that exports 280 different functions and methods, including
access to almost all of the `dplyr` verbs plus a few more, developed for
use with relational data. While network data itself is not tidy, it can
be envisioned as two tidy tables, one for node data and one for edge
data.

The {tidygraph} package provides a way to switch between the two tables
and uses `dplyr` verbs to manipulate them. Furthermore it provides
access to a lot of graph algorithms with return values that facilitate
their use in a tidy workflow.

#### ggraph 📦

![](img/ggraph.png){width="20%"}

Created by the same developer as {tidygraph},
{[ggraph](https://ggraph.data-imaginist.com/index.html)} -- pronounced
gg-raph or g-giraffe hence the logo -- is an extension of
{[ggplot](https://ggplot2.tidyverse.org)} aimed at supporting relational
data structures such as networks, graphs, and trees. Both packages are
modern and widely adopted approaches to data visualization in R.

While ggraph builds upon the foundation of ggplot and its API, it comes
with its own self-contained set of geoms, facets, etc., as well as
adding the concept of *layouts* to the [grammar of
graphics](https://ggplot2-book.org/introduction.html?q=grammar#what-is-the-grammar-of-graphics),
i.e. the "gg" in ggplot and ggraph.

#### readxl 📦

![](img/readxl.png){width="20%"}

The [{readxl}](https://readxl.tidyverse.org/) package makes it easy to
get data out of Excel and into R. Compared to many of the existing
packages (e.g. gdata, xlsx, xlsReadWrite) readxl has no external
dependencies, so it's easy to install and use on all operating systems.
It is designed to work with *tabular* data.

Since one of our data wrangling steps in the next section is importing
network matrices stored in Excel files, this package will come in handy.

#### **👉 Your Turn** **⤵**

Use the code chunk below to load the {tidygraph}, {ggraph}, and {readxl}
packages:

```{r load-packages}
# YOUR CODE HERE
library(tidygraph)
library(ggraph)
library(readxl)
```

------------------------------------------------------------------------

## 2. WRANGLE

In general, data wrangling involves some combination of cleaning,
reshaping, transforming, and merging data [@wickham2016r]. As
highlighted in @estrellado2020e, wrangling network data can be even more
challenging than other data sources since network data often includes
variables about both individuals and their relationships.

For our data wrangling in lab, we're keeping it simple since working
with relational data is a bit of a departure from our working with
rectangular data frames. Our primary goals for Lab 1 are learning how
to:

a.  **Import Data**. In this section, we learn about the `read_excel()`
    function for importing network data stored in a common format for
    relational data: the adjacency matrix.

b.  **Create a Network Object**. Before we can create our sociogram,
    we'll first need to convert our data frames into a special data
    format, an R network object, for working with relational data.

### 2a. Import Data

One of our primary goals for this case study is to create sociograms
that compare student- and teacher-reported friendships. To do so, we'll
need to import two Excel files originally obtained from
the [Social Network Analysis and Education companion
site](https://studysites.sagepub.com/carolan/study/resources.htm). Both
files contain edges stored as a square matrix (more on this later) for
one class of students in the fall semester.

These files are included in the lab-1/data folder of your R Studio
project. A description of each file from the companion website is copied
below along with a link to the original file:

1.  [**99472_ds3.xlsx**](https://studysites.sagepub.com/carolan/study/materials/datasets/99472_ds3.xlsx): This
    adjacency matrix consists of **student-reported** friendship
    relations among 27 students in one class in the fall semester. These
    data are directed and unweighted; a friendship tie is present if the
    student reported that another was either a best friend or friend.

2.  [**99472_ds5.xlsx**](https://studysites.sagepub.com/carolan/study/materials/datasets/99472_ds5.xlsx): This
    adjacency matrix consists of the **teacher-reported** friendship
    relations among 27 students in one class in the fall semester. These
    data are directed and unweighted; a friendship tie is present if the
    teacher reported that students were either a best friend or friend.

Recall from above that our relations, or edges, are stored as an
[adjacency
matrix](https://en.wikipedia.org/wiki/Adjacency_matrix) in which columns
and rows consist of the same actors and each cell contains information
about the tie between each pair of actors. In our case, the tie is a
directed and unweighted "arc" where a 1 indicates that a friendship was
reported and a 0 indicates that it was not.
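
As a quick illustration of how to read this kind of matrix, the base R
sketch below builds a hypothetical three-student adjacency matrix (the
`toy_friends` object and its values are made up, not taken from the lab
data). It assumes the usual convention that rows are the students doing
the naming and columns are the students being named.

```r
# Hypothetical directed, unweighted adjacency matrix for three students.
# A 1 in row i, column j means student i named student j as a friend.
toy_friends <- matrix(c(0, 1, 1,
                        1, 0, 0,
                        0, 0, 0),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(c("1", "2", "3"), c("1", "2", "3")))

toy_friends["1", "2"]   # 1: student 1 named student 2
toy_friends["3", "1"]   # 0: student 3 did not name student 1

rowSums(toy_friends)    # friendships each student reported
colSums(toy_friends)    # times each student was named by others

# A tie is reciprocated when both cells of a pair equal 1.
toy_friends * t(toy_friends)
```

Row and column sums correspond to the out-degree and in-degree of each
student, and the final product keeps a 1 only where both students named
each other.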

Let's use the `read_excel()` function to import
the `student-reported-friends.xlsx` file, add an argument setting the
column names to `FALSE` since our file is a simple matrix with no header
or column names, and assign the matrix to a variable
named `student_friends`:

**R Studio Tip:** Type `?read_excel` into the console and check the
arguments section to examine the different arguments that can be used
with this function.

```{r student-data}
student_friends <- read_excel("data/student-reported-friends.xlsx", 
                              col_names = FALSE)
```

Before importing our teacher reported friendship file, let's quickly
inspect the `student_friends` R object we just imported to see what
we'll be working with.

```{r inspect-students}
student_friends
```

As you can see, we have a 27 x 27
"[tibble](https://tibble.tidyverse.org/)" or data table representing our
friendship ties. Unfortunately, this data is stored in such a simple
format that we have no way to easily identify who is friends with whom,
since our data is missing names or some kind of identifier for students
in our network.

R has packages for creating random names to help anonymize data, but to
keep things simple, we'll just assign the numbers 1-27 as names for our
rows and columns.

```{r assign-names}
rownames(student_friends) <- 1:27

colnames(student_friends) <- 1:27
```

You may have seen a warning
stating: `Setting row names on a tibble is deprecated.` You can ignore
that for now, but it's basically telling us these functions are old and
we need to use a newer approach or our code will some day stop working.
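
One possible way to sidestep this warning, sketched here on a small
hypothetical data frame rather than the lab's data, is to convert the
table to a matrix first and then name its rows and columns. The
`toy_data` and `toy_matrix` objects are made up for illustration, and
this is an alternative to the approach used above, not a required
change.

```r
# Hypothetical stand-in for an imported 3 x 3 table with no useful names.
toy_data <- data.frame(c(0, 1, 0), c(1, 0, 0), c(1, 0, 0))

# Convert to a matrix first, then name rows and columns in one step.
toy_matrix <- as.matrix(toy_data)
dimnames(toy_matrix) <- list(1:3, 1:3)

toy_matrix
```

Matrices support row names natively, so no deprecation warning is
raised, and the numbers are stored as character labels.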

Again, let's quickly inspect our `student_friends` data table to see if
this worked:

```{r view-students}
student_friends
```

Much better! Now we can see that student 1 indicated that student 2 is
their friend, and student 2 indicated that student 1 is their friend, so
we can say that this friendship is "reciprocated."

#### **👉 Your Turn** **⤵**

Complete the code chunk below to import the
`teacher-reported-friends.xlsx` file, assign row and column names, and
inspect the result:

```{r import-teacher}
# YOUR CODE HERE
teacher_friends <- read_excel("data/teacher-reported-friends.xlsx", 
                              col_names = FALSE)

rownames(teacher_friends) <- 1:27

colnames(teacher_friends) <- 1:27

teacher_friends
```

### 2b. Make a Tidy Graph

Before we can begin exploring our data through network
visualization, we must first restructure our "tibble" into a formal
matrix object and then convert to a network class R object required by
the {tidygraph} and {ggraph} packages.

#### Convert to Matrix

Now that we have names included for our rows and columns, we need to
convert our data table, or tibble, to a formal matrix class object. To
do so is relatively simple using the `as.matrix()` function built into
R.

```{r convert-matrix}
student_matrix <- as.matrix(student_friends)
```

The words "class" and "object" have been used quite a bit in this
case-study and warrant a brief explanation. Classes and objects are
basic concepts of Object-Oriented Programming environments like R.
An **object** is simply a data structure that has some methods and
attributes. Everything in R is essentially an object. A **class** is
just a blueprint or a sketch of these objects. It represents the set of
properties or methods that are common to all objects of one type.

Let's use the `class()` function on the `student_friends`
and `student_matrix` to see the types of objects we just created:

```{r check-class}
class(student_friends)

class(student_matrix)
```

Great! We can now see that our `student_matrix` is formally an object of
the "matrix" class.

#### **Convert to Graph Object**

Our final step before we're able to begin exploring our data is to
convert our matrix to a network object recognized by the {tidygraph} and
{ggraph} packages.
The [`as_tbl_graph()`](https://tidygraph.data-imaginist.com/reference/tbl_graph.html) function
can easily convert relational data from all common network data formats
such as matrices, `network`, `phylo`, `dendrogram`, `data.tree`,
`graph`, etc. 

Run the following code to convert our matrix to a directed network graph
and save it as a new object called `student_network`. Note that we
include the argument `directed = TRUE` in our `as_tbl_graph()` function
since our network is directed.

```{r convert-network}
student_network <- as_tbl_graph(student_matrix, directed = TRUE)
```

Now let's take a quick look at our new `student_network` object:

```{r student-network}
student_network
```

As you can see, our `student_network` object provides a range of
information about our network including network size, type, number of
components, and a preview of the node and edge lists that it created.
The node and edge lists are treated just like a typical data frame and
can now be used with other tidyverse packages and functions to create
new actor-level network variables like degree, reciprocity, and
centrality measures.
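
To illustrate how the node and edge tables can each be treated like a
data frame, here is a minimal sketch using a hypothetical four-student
network. The `toy_network`, `node_table`, and `edge_table` names are
made up for illustration; the functions come from {tidygraph} and
{dplyr}.

```r
library(dplyr)
library(tidygraph)

# Hypothetical network: a and b name each other, a also names c,
# and d has no ties at all.
toy_network <- tbl_graph(
  nodes = data.frame(name = c("a", "b", "c", "d")),
  edges = data.frame(from = c("a", "b", "a"), to = c("b", "a", "c")),
  directed = TRUE
)

# Work with the node table and pull it out as a regular tibble.
node_table <- toy_network %>%
  activate(nodes) %>%
  as_tibble()

# Switch to the edge table and flag ties that are reciprocated.
edge_table <- toy_network %>%
  activate(edges) %>%
  mutate(reciprocated = edge_is_mutual()) %>%
  as_tibble()

node_table
edge_table
```

The `activate()` function is what switches between the two tables, so
the same `mutate()` verb can add node variables or edge variables
depending on which table is active.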

#### What is an edge list? 

We'll learn more about edgelists in Lab 3. The **edgelist** format is
very commonly used in network analysis but is slightly different than
other formats you have likely worked with before. Specifically, the
values in the first two columns of each row represent a dyad, or tie
between two nodes in a network. An edgelist can also contain other
information regarding the strength, duration, or frequency of the
relationship, sometimes called **weight**, in addition to other "edge
attributes."

In directed networks like ours, the first column identifies the student
who reported a friendship and the second column identifies the student
they named. For example, student 1 indicated that students 2, 4, and 5
are friends. Since our network is unweighted, the 1 for "weight" just
indicates that a friendship was present.
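
The relationship between an adjacency matrix and an edgelist can be
sketched in base R as follows. The three-student `friends` matrix is
hypothetical, and this hand-rolled conversion is only meant to show the
idea; it is not how `as_tbl_graph()` works internally.

```r
# Hypothetical 3-student adjacency matrix: rows name friends in columns.
friends <- matrix(c(0, 1, 1,
                    1, 0, 0,
                    0, 0, 0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("1", "2", "3"), c("1", "2", "3")))

# Find the row and column position of every cell equal to 1.
ties <- which(friends == 1, arr.ind = TRUE)

# Each tie becomes one row: who named whom, plus the cell value.
edge_list <- data.frame(
  from   = rownames(friends)[ties[, "row"]],
  to     = colnames(friends)[ties[, "col"]],
  weight = friends[ties]
)

# Sort by sender so each student's nominations appear together.
edge_list <- edge_list[order(edge_list$from, edge_list$to), ]
edge_list
```

Each row of `edge_list` is one dyad, so student 1's two nominations
appear as two separate rows.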

#### **👉 Your Turn** **⤵**

Complete the code chunk below to convert your `teacher_friends` object
first to a matrix and then to a network object:

```{r}
# YOUR CODE HERE
teacher_matrix <- as.matrix(teacher_friends)

teacher_network <- as_tbl_graph(teacher_matrix, directed = TRUE)

teacher_network
```

Now answer the following questions:

1.  How many students are in our network?
    -   YOUR RESPONSE HERE
2.  Who reported more friendships, teachers or students? How do you
    know?
    -   YOUR RESPONSE HERE

------------------------------------------------------------------------

## 3. EXPLORE

As noted in our course readings, exploratory data analysis involves
the processes of describing your data (such as by calculating the means
and standard deviations of numeric variables, or counting the frequency
of categorical variables) and, often, visualizing your data prior to
modeling.

In Section 3, we use the {tidygraph} package for retrieving network
descriptives and introduce the {ggraph} package to create a network
visualization to help illustrate these metrics. Specifically, in this
section we'll learn to:

a.  **Plot Basics**. We focus primarily on actors and edges in this
    walkthrough, including the edge weights created in the previous
    section as well as node degree, an important and fairly intuitive
    measure of centrality.

b.  **Adding Nodes**.

c.  **Adding Edges**. Finally, we wrap up the explore phase by learning
    to plot a network and tweak key elements like the size, shape, and
    position of nodes and edges to better communicate key findings.

One of the defining characteristics of the social network perspective is
its use of graphic imagery to represent actors and their relations with
one another. To emphasize this point, @carolan2014 reported that:

> The visualization of social networks has been a core practice since
> its foundation more than 100 years ago and remains a hallmark of
> contemporary social network analysis. 

Network visualization can be used for a variety of purposes, ranging
from highlighting key actors to even serving as works of art.

This excellent figure from Katya Ognyanova's also excellent tutorial on
[Static and Dynamic Network Visualization with
R](https://kateto.net/network-visualization/) helps illustrate the
variety of goals a good network visualization can accomplish:

![](img/viz-goals.jpeg){width="80%"}

### 3a. Simple Sociograms 

A visual representation of the actors and their relations, i.e. the
network, is called a **sociogram**. Actors who are most central to the
network, such as those with higher node degrees, are usually placed in
the center of the sociogram and their ties are placed near them. As
we'll see in just a bit, students with many friendship ties will be
placed by most graph layout algorithms near the center of the graph.

The `plot()` function from R's built in {graphics} package can be used
to make a wide range of graphs, including sociograms, but as you'll see
it's a bit lacking and is limited in the level of customization
allowed.

In the code chunk below, use the `plot()` function with your
`student_network` object to see what the basic plot function produces:

```{r plot-network}
plot(student_network)
```

Not super great. In fact, it's visualizations like these that give
sociograms the unflattering nickname of "hair ball" plots!

If this had been a smaller network this might have been a little more
useful, but one important insight is that we have already identified an
"isolate" in our network, that is, a student who neither named others
as a friend nor was named by others as a friend.

Fortunately, the {ggraph} package includes a plethora of plotting
parameters for graph
[layouts](https://ggraph.data-imaginist.com/articles/Layouts.html),
[edges](https://ggraph.data-imaginist.com/articles/Edges.html) and
[nodes](https://ggraph.data-imaginist.com/articles/Nodes.html) to
improve the visual design of network graphs.

Let's first take a quick look at the `autograph()` function for making
quick and simple sociograms.

```{r auto-graph}
autograph(student_network)
```

A little better, but also lacking in many important ways. Like the
`plot()` function, it does allow some small degree of customization, but
is still rather limited and best used for very quick sociograms to get a
quick feel for the data.

Run the following code chunk to see some additional arguments you can
add to the `autograph()` function:

```{r student-sociogram}
autograph(student_network,
          node_size = local_size(),
          node_label = name,
          node_colour = local_size())
```
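
Since `local_size()` drives both node size and color here, it helps to
see what it counts. The sketch below uses a hypothetical four-student
network (`toy_network` and `toy_nodes` are made-up names) and assumes
the function's default arguments, under which it returns the number of
nodes within one step of each node, counting the node itself and
ignoring tie direction. It compares that value with the in-degree and
out-degree returned by `centrality_degree()`.

```r
library(dplyr)
library(tidygraph)

# Hypothetical network: a and b name each other, a also names c,
# and d is an isolate.
toy_network <- tbl_graph(
  nodes = data.frame(name = c("a", "b", "c", "d")),
  edges = data.frame(from = c("a", "b", "a"), to = c("b", "a", "c")),
  directed = TRUE
)

toy_nodes <- toy_network %>%
  activate(nodes) %>%
  mutate(
    neighborhood = local_size(),                     # node plus its neighbors
    in_degree    = centrality_degree(mode = "in"),   # times named by others
    out_degree   = centrality_degree(mode = "out")   # friends the node named
  ) %>%
  as_tibble()

toy_nodes
```

In this toy network the isolate `d` gets a neighborhood size of 1 rather
than 0 because the node itself is counted, so degree may be the better
choice when an exact count of friendship ties is needed.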

#### **👉 Your Turn** **⤵**

Use the code chunk below to try out these simple sociogram functions on
your `teacher_network` object you created above:

```{r teacher-sociogram}
# YOUR CODE HERE
plot(teacher_network)

autograph(teacher_network,
          node_size = local_size(),
          node_colour = local_size())
```

### 3b. Sophisticated Sociograms

One thing to keep in mind when building a network viz with {ggraph}, is
that just like its `ggplot()` counterpart, the `ggraph()` function is
the first function required and takes care of setting up the plot object
along with creating the layout for the plot based on the network object
and the layout specification provided.

Let's first pass our `student_network` object to `ggraph()` and see what
happens.

```{r}
ggraph(student_network)
```

Wow, that was unimpressive. But don't worry: just like the `ggplot()`
function, `ggraph()` doesn't produce much on its own. All that the
`ggraph()` function does is set up the network object to make a
sociogram and create a layout for our network, in this case using the
default "stress" layout.

#### Add Nodes

Very similar to how `ggplot()` uses the `+` operator to "layer"
functions together to progressively build more sophisticated graphs,
`ggraph()` uses the `+` operator to progressively build a sociogram.

To add our nodes, we'll add the `geom_node_point()` function. Again,
just like with {ggplot2}, the "geom" in the `geom_node_point()` function
stands for "geometric elements", or geoms for short, and represents what
you actually see in the plot.

#### **👉 Your Turn** **⤵**

Now "add" the `geom_node_point()` function to our code using the `+`
operator:

```{r}
ggraph(student_network) + 
  geom_node_point() 
```

Well, at least we have our nodes now! But the default "stress" layout
for our sociogram is not so great. Let's fix that.

#### Add Layout

One of the major advances in visualization since the first hand-drawn
sociograms developed by Jacob Moreno (1934) to represent relations among
children in school is the use of software and algorithms to
automatically lay out networks on a grid.

There are many different [layout
methods](https://en.wikipedia.org/wiki/Graph_drawing#Layout_methods),
but we'll start with the Fruchterman-Reingold (FR) layout, which is one
of the most used layout algorithms for network visualization. These
types of force-directed algorithms generally work well with large
networks and try to lay out graphs in "an aesthetically-pleasing way" by
making edges roughly equal in length and minimizing overlap.

Let's go ahead and include the `layout =` argument. In addition to
{ggraph}'s own unique layouts, it accepts layouts from the {igraph}
package, like `fr` for the Fruchterman-Reingold (FR) layout:

```{r}
ggraph(student_network, layout = "fr") +
  geom_node_point()
```

That's not much better so let's stick with the "stress" layout for now.
Feel free to try out some other [ggraph layout
methods](https://ggraph.data-imaginist.com/articles/Layouts.html) if you
like, however.
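
If you are curious what a layout actually is, it helps to look at one
directly: a layout is essentially a table of x and y coordinates, one
row per node. The sketch below uses `create_layout()` from {ggraph} on a
small hypothetical ring network built with `create_ring()` from
{tidygraph}. The object names (`toy_ring`, `fr_coords`,
`circle_coords`) are made up for illustration, and the seed is set only
because force-directed layouts such as "fr" start from random positions:

```{r}
library(tidygraph)
library(ggraph)

# Hypothetical toy network: six nodes connected in a ring
toy_ring <- create_ring(6)

# Force-directed layouts are randomly initialised, so fix the seed
set.seed(123)
fr_coords <- create_layout(toy_ring, layout = "fr")
circle_coords <- create_layout(toy_ring, layout = "circle")

# Each layout is a data frame with an x and y position for every node
head(fr_coords)
head(circle_coords)

# The same layout name can be passed straight to ggraph()
ggraph(toy_ring, layout = "circle") +
  geom_edge_link() +
  geom_node_point()
```

Comparing the `x` and `y` columns across layouts is a quick way to see
that only node positions change; the nodes and edges themselves stay the
same.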

#### Tweak Nodes

Also like {ggplot2}, geoms can include aesthetics, or aes for short,
such as `alpha` for transparency, as well as `color`, `shape` and
`size`.

Let's now add some "aesthetics" to our points by including the `aes()`
function and arguments such as `size =` and `color =`, which we set
using the `local_size()` function to help highlight the number of
friends students have:

```{r add-color}
ggraph(student_network, layout = "stress") + 
geom_node_point(aes(size = local_size(),
                    color = local_size()))
```

We can easily see that the number of friends ranges from 5 to 20, with
the exception of one "isolated" student we identified earlier who is not
connected to any other students in the network.
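
It is worth knowing what `local_size()` counts before reading too much
into the legend. With its defaults it returns the size of each node's
immediate neighborhood, ignoring tie direction and including the node
itself, which is related to but not the same as degree. The sketch below
compares it with `centrality_degree()` on a hypothetical four-student
network (`toy_network` and the column names are made up for
illustration):

```{r}
library(tidygraph)

# Hypothetical toy data: A and B name each other, A also names C,
# and D is an isolate
toy_nodes <- data.frame(name = c("A", "B", "C", "D"))
toy_edges <- data.frame(from = c("A", "B", "A"),
                        to   = c("B", "A", "C"))

toy_network <- tbl_graph(nodes = toy_nodes,
                         edges = toy_edges,
                         directed = TRUE)

sizes <- toy_network %>%
  activate(nodes) %>%
  mutate(neighborhood = local_size(),                     # self + neighbors
         degree_all   = centrality_degree(mode = "all"),  # ties sent + received
         named_by     = centrality_degree(mode = "in")) %>%  # ties received
  as_tibble()

sizes
```

In this toy example the isolate D has a neighborhood size of 1 (just
itself) but a degree of 0, so if you want the legend to read as a count
of friendship ties, `centrality_degree()` may be the more literal
choice.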

We still can't tell which student is which, so let's add another layer
with some node text labels. Since node labels are a geometric element,
we can apply aesthetics to them as well, like color and size. Let's also
include the `repel =` argument, which when set to `TRUE` keeps the text
labels from overlapping.

```{r add-labels}
ggraph(student_network, layout = "stress") + 
  geom_node_point(aes(size = local_size(),
                      color = local_size())) +
  geom_node_text(aes(label = name,
                     size = local_size()), # note we can treat this like a number
                 repel=TRUE)
```

#### Add Edges

Now, let's literally connect the dots and add some
[edges](https://ggraph.data-imaginist.com/articles/Edges.html) using the
`geom_edge_link()` function.

```{r}
ggraph(student_network, layout = "stress") + 
  geom_node_point(aes(size = local_size(),
                      color = local_size())) +
  geom_node_text(aes(label = name),
                 repel=TRUE) +
  geom_edge_link()
```

Ack! Without some adjustment, the edges make it really difficult to see
the nodes. Fortunately, you can adjust the edges just like we did the
nodes above. Let's now include the following arguments:

-   `arrow =` to add arrowheads 1mm in length,

-   `end_cap =` to leave a small gap around each node and keep arrows
    from overlapping them, and

-   `alpha = .2` to set the transparency of our edges so they fade into
    the background and help keep the focus on our nodes:

```{r tweak-edges}
ggraph(student_network, layout = "stress") + 
  geom_node_point(aes(size = local_size(),
                      color = local_size())) +
  geom_node_text(aes(label = name),
                 repel=TRUE) +
  geom_edge_link(arrow = arrow(length = unit(1, 'mm')), 
                 end_cap = circle(3, 'mm'),
                 alpha = .2)
```

#### Add a Theme

Finally, let's add a **theme**, which controls the finer points of
display, like the font size and background color. The `theme_graph()`
function adds a theme specially tuned for graph visualizations. This
function removes redundant elements in order to put focus on the data,
and if you type `?theme_graph` in the console you will get a sense of
the level of fine-tuning you can do if desired.

Let's add `theme_graph()` to our sociogram, remove the legends since
they are not especially useful, and call it good for now:

```{r add-theme}
ggraph(student_network, layout = "stress") + 
  geom_node_point(aes(size = local_size(),
                      color = local_size())) +
  geom_node_text(aes(label = name),
                 repel=TRUE) +
  geom_edge_link(arrow = arrow(length = unit(1, 'mm')), 
                 end_cap = circle(3, 'mm'),
                 alpha = .2) +
  theme_graph() +
  theme(legend.position = "none") # hide the size and color legends
```

Much better!

**Note:** If you're having difficulty seeing the sociogram in the small
R Markdown code chunk, you can copy and paste the code into the console.
The plot will then show in the Plots pane, where you can enlarge it and
even save it as an image file.
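
You can also save a sociogram from code rather than through the RStudio
interface. Because a {ggraph} plot is a {ggplot2} object, `ggsave()`
should work on it directly. The sketch below uses a hypothetical toy
network and writes to a temporary folder; the object names, file name
and dimensions are placeholders to adapt:

```{r}
library(tidygraph)
library(ggraph)

# Hypothetical toy network standing in for student_network
toy_ring <- create_ring(5)

toy_plot <- ggraph(toy_ring, layout = "stress") +
  geom_edge_link(alpha = .2) +
  geom_node_point() +
  theme_graph()

# Write the plot to a PNG file; swap tempdir() for your project folder
out_file <- file.path(tempdir(), "toy-sociogram.png")
ggsave(out_file, plot = toy_plot, width = 6, height = 4, dpi = 300)
```

Saving from code makes the figure reproducible and lets you control its
size explicitly, which is handy once you reach the COMMUNICATE section.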

#### **👉 Your Turn** **⤵**

Now that you have a sense of how the {ggraph} package works to build
network graphs, use the code chunk below and try building a more
sophisticated sociogram for the `teacher_network` object that you
created above.

There are no right or wrong answers, just have some fun trying out
different options for graph
[layouts](https://ggraph.data-imaginist.com/articles/Layouts.html),
[edges](https://ggraph.data-imaginist.com/articles/Edges.html) and
[nodes](https://ggraph.data-imaginist.com/articles/Nodes.html) and see
if you can build something that is visually pleasing to you.

```{r your-sociogram}
ggraph(teacher_network, layout = "fr") + 
  geom_node_point(aes(size = local_size(),
                      color = local_size())) +
  geom_node_text(aes(label = name),
                 repel=TRUE) +
  geom_edge_link(arrow = arrow(length = unit(1, 'mm')), 
                 end_cap = circle(3, 'mm'),
                 alpha = .2) +
  theme_graph()
```

Congrats! You made it to the end of the EXPLORE section!

------------------------------------------------------------------------

## 4. MODEL

As highlighted in [Chapter 3 of Data Science in Education Using
R](https://datascienceineducation.com/c03.html), the **Model** step of
the data science process entails "using statistical models, from simple
to complex, to understand trends and patterns in the data." We will not
explore the use of models for SNA until Lab 4, but recall from the
PREPARE section that to assess agreement between perceived friendships
by the teacher and students, [@pittinsky2008behavioral] note that:

> **The QAP (quadratic assignment procedure)** [is] used to calculate
> the degree of association between two sets of relations and tests
> whether the probability of dyad overlap in the teacher matrix is
> correlated with the probability of dyad overlap in the student matrix.
> It does so by running a large number of simulations. These simulations
> generate random matrices with sizes and value distributions based on
> the original two matrices being tested.

We will learn more about the QAP and other models for statistical
inference when working with relational data in Learning Lab 4.
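
As a small preview, the simulation idea behind the QAP can be sketched
in a few lines of base R. This is not the authors' code or a full QAP
implementation: it uses two hypothetical 0/1 friendship matrices
(`student_mat` and `teacher_mat` are invented here), and it assumes the
common approach of generating the comparison matrices by shuffling the
rows and columns of one matrix together, which keeps its structure
intact while scrambling who is who:

```{r}
set.seed(2008)
n <- 8  # hypothetical class of eight students

# Hypothetical student-reported ties (rows = namer, columns = named)
student_mat <- matrix(rbinom(n * n, 1, 0.3), nrow = n, ncol = n)
diag(student_mat) <- 0

# Hypothetical teacher-perceived ties: the student matrix with 10 dyads flipped
off_diag <- row(student_mat) != col(student_mat)
teacher_mat <- student_mat
flipped <- sample(which(off_diag), 10)
teacher_mat[flipped] <- 1 - teacher_mat[flipped]

# Correlation between the two matrices across all off-diagonal dyads
dyad_cor <- function(a, b) cor(a[off_diag], b[off_diag])
observed <- dyad_cor(student_mat, teacher_mat)

# Reference distribution: relabel the students in one matrix at random
# (same permutation for rows and columns) and recompute the correlation
perm_cors <- replicate(1000, {
  p <- sample(n)
  dyad_cor(student_mat[p, p], teacher_mat)
})

# Share of permutations with a correlation at least as large as observed
p_value <- mean(perm_cors >= observed)

observed
p_value
```

A small `p_value` would suggest the two matrices agree more than we
would expect if the teacher's perceptions were unrelated to which
students actually named each other.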

------------------------------------------------------------------------

## 5. COMMUNICATE

Our goal is to distill the analysis from above into a simple "data
product" designed to illustrate key findings about how teacher-perceived
and student-reported friendship networks compare. For the purposes of
this task, imagine that your audience consists of teachers and school
leaders who have limited background in SNA, and adapt the following
steps accordingly:

1.  **Select.** Select our sociogram from above, or create an entirely
    new sociogram if so motivated, that you think would be interesting
    or relevant for the target audience and that helps answer our
    research question.

2.  **Polish.** Create a visually attractive sociogram to help
    illustrate similarities and differences in classroom friendships
    reported by teachers and students.

3.  **Narrate.** Write a brief narrative to accompany your visualization
    and/or table that includes the following:

    -   The question or questions guiding the analysis;

    -   The conclusions you've reached based on our findings;

    -   How your audience might use this information;

    -   How you might revisit or improve upon this analysis in the
        future.
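
One possible way to put the two perspectives side by side for the
**Polish** step is to store both sets of ties in a single network with
an edge attribute recording who reported them, and then split the plot
with `facet_edges()` from {ggraph}. The sketch below uses hypothetical
toy data (`toy_nodes`, `toy_edges` and the `reporter` column are
invented for illustration), so you would need to build the equivalent
combined edge list from your own student and teacher data:

```{r}
library(tidygraph)
library(ggraph)

# Hypothetical toy data: the same four students, with ties reported
# either by the students themselves or by the teacher
toy_nodes <- data.frame(name = c("A", "B", "C", "D"))
toy_edges <- data.frame(
  from     = c("A", "B", "C", "A", "B"),
  to       = c("B", "A", "D", "B", "C"),
  reporter = c("student", "student", "student", "teacher", "teacher")
)

toy_both <- tbl_graph(nodes = toy_nodes,
                      edges = toy_edges,
                      directed = TRUE)

# One panel per reporter; nodes keep the same position in both panels
comparison_plot <- ggraph(toy_both, layout = "stress") +
  geom_edge_link(aes(colour = reporter),
                 arrow = arrow(length = unit(1, "mm")),
                 end_cap = circle(3, "mm")) +
  geom_node_point() +
  geom_node_text(aes(label = name), repel = TRUE) +
  facet_edges(~reporter) +
  theme_graph()

comparison_plot
```

Because both panels share one layout, a tie that appears in one panel
but not the other is easy to spot, which may suit an audience with
limited SNA background.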

### 👉 Your Turn ⤵

Use the code chunk below to create a polished table and/or
visualization(s) and write a brief narrative in the space that follows.

### Data Visualization or Table

```{r create_data_product}
# YOUR CODE HERE


```

### Narrative

NARRATIVE GOES HERE...

### 🧶 Knit & Check ✅

Congratulations - you've completed this case study! One final step is to
"Knit" your document by clicking the drop-down arrow next to the ball of
yarn in the menu bar at the top of this markdown file, and then
selecting "Knit to HTML" or another preferred output format. This will
do two things: 1) it will check through all your code for any errors,
and 2) it will create a file in your directory that you can use to share
your work through [GitHub Pages](https://pages.github.com),
[RPubs](https://rpubs.com/about/getting-started), or any other preferred
means.

### References
